Temporal communities are the result of a consistent partitioning of nodes across multiple snapshots of an 
evolving network, and they provide insights into how dense clusters in a network emerge, combine, split and 
decay over time. To reliably detect temporal communities we need to not only find a good community 
partition in a given snapshot but also ensure that it bears some similarity to the partition(s) found in the 
previous snapshot(s), a particularly difficult task given the extreme sensitivity of community structure 
yielded by current methods to changes in the network structure. Here, motivated by the inertia of inter-node 
relationships, we present a new measure of partition distance called estrangement, and show that 
constraining estrangement enables one to find meaningful temporal communities at various degrees of 
temporal smoothness in diverse real-world datasets. Estrangement confinement thus provides a principled 
approach to uncovering temporal communities in evolving networks. 

Community detection has been shown to reveal latent yet meaningful structure in networks such as groups 
in online and contact-based social networks, functional modules in protein-protein interaction networks, 
groups of customers with similar interests at online retailers, disciplinary groups of scientists in collaboration 
networks, etc. 1 . Temporal community detection aims to find how such communities emerge, grow, 
combine and decay in networks that evolve with time. Temporal communities can provide robust network-based 
insights into complex phenomena such as the evolution of inter-country trade networks, the emergence of 
celebrities in social media, the formation of distinct political ideologies, the spread of epidemics, trends in venture 
investment, etc. 

Static community detection 1 partitions a network into groups of nodes such that the intra-group edge density is 
higher than the inter-group edge density. A partition can be specified by labels assigned to the nodes in the 
network, and a group of nodes with the same label constitutes a community. Methods used to discover communities 
in static networks find a partition of nodes which optimizes some quality (objective) function that 
quantifies how community-like the partition is. For time-varying networks, given time snapshots, temporal 
community detection assigns labels to nodes in each snapshot, and the set of {node, time} pairs that get the same 
label constitutes a temporal community. We define a temporal community structure as a partitioning of the {node, 
time} pairs over all snapshots that optimizes an appropriate quality function. We focus on the sequential version of 
the temporal community detection problem, where one is allowed to do computations only on the current 
snapshot while using limited information from the past. Sequential methods are useful in situations where the 
number of snapshots is large, or fast computation of temporal communities is important as new snapshots 
become available. 

A popular approach to detecting temporal communities is to find static communities independently in each 
snapshot using some quality function and then "map" communities between snapshots to preserve labels when 
possible. Examples of this approach include the map-equation method 2 and the clique percolation method 3 . 
However, these methods do not explicitly use the partitions found in past snapshots to inform the search for the 
optimal partition on the current snapshot. We argue (and show empirically in Results) that mapping independently 
detected communities is likely to miss crucial temporal communities as most quality functions used for 
static community detection are highly degenerate and extremely sensitive to changes in the network. This has 
been demonstrated specifically for modularity 4 , one of the earliest proposed and still commonly used quality 
functions, though several others have subsequently been introduced. Good et al. show that for many real-world 
networks, the modularity landscape is highly degenerate and disordered with numerous partitions yielding similar 



SCIENTIFIC REPORTS | 2 : 794 | DOI: 10.1038/srep00794 







Figure 1 | An example illustrating the detection of temporal 
communities via estrangement confinement. The network on the left, 
$G_{t-1}$, consists of 20 nodes and 52 links, and a maximal modularity partition 
of this network consists of three communities represented by the three 
colors ($Q = 0.52$). In the next snapshot, the network has evolved to $G_t$, 
which differs from $G_{t-1}$ only in the absence of a single link, indicated by the 
dotted line. The top right and bottom right networks both represent the 
same network but indicate distinct choices of community partitions 
available. The partition shown on the top right, $P_t^1$, consists of 4 
communities and is the partition that gives the highest modularity, 
$Q_t^1 = 0.5296$. The partition $P_t^2$ for $G_t$ shown on the bottom right, which 
preserves the node partition chosen for $G_{t-1}$, has a slightly lower modularity 
of $Q_t^2 = 0.5248$. The partition $P_t^1$ with higher modularity, however, makes 7 
links estranged. The estranged links (shown in gray) are those intra-community 
links at $t-1$ that change to inter-community links at $t$. Notice 
that links in the orange community of $P_t^1$, despite having changed their 
community affiliation from $t-1$ to $t$, are not estranged since they are still 
intra-community links. In contrast to $P_t^1$, the partition $P_t^2$ yields no 
estranged links. Estrangement, $E$, defined as the fraction of estranged links 
at $t$, is therefore 0 for $P_t^2$ but 7/51 = 0.13 for $P_t^1$. Maximizing modularity 
while constraining estrangement to a low value (e.g. 0.05) therefore yields 
$P_t^2$ as the partition for $G_t$, yielding a smoother temporal progression of the 
community structure from $t-1$ to $t$. 

values of modularity and constituting distinct local maxima. 
Importantly, they also show that other community quality functions 
are likely to have similarly degenerate landscapes. Moreover, the 
quality function landscape is highly sensitive to changes in the network, 
as shown by Karrer et al. 6 for modularity on several synthetic 
and real networks. Sensitivity implies it is very likely that a rather 
distinct community structure is detected even when the network 
changes slightly, which, when coupled with the degeneracy of the 
quality function landscape, makes consistent mapping of independently 
detected communities across snapshots very difficult. 

To counter these challenges, it is important to use the past community 
structure when searching for good partitions in the current 
snapshot to maintain some temporal contiguity between subsequent 
partitions. Independent maximization of modularity (or 
some other quality function) on each snapshot has no incentive to 
maintain such temporal contiguity between partitions. Also, the 
naive approach of initializing the search for a good partition of the 
current snapshot at the preceding snapshot's optimal partition has 
the serious drawback that it fails to detect the birth of new communities 
(unless a significant number of new nodes are added), since 
most partition search methods decrease or keep constant the number 
of communities found. 



We propose a principled approach to find meaningful temporal 
communities that limits the search for near-optimal partitions to 
those partitions in the current snapshot that bear some similarity 
to the partitions found in previous snapshots. One of the key challenges 
is to find a measure of this partition similarity (or distance) 
that is appropriate for comparing partitions of different snapshots 
of an evolving network. None of the existing measures of partition 
distance, such as Variation of Information (VI) 5 , are suitable for 
comparing partitions of nodes in distinct snapshots because they 
do not consider edges, and therefore cannot account for changes in 
network structure. In particular, we require a measure that is tolerant 
of differences in partitions when the network has changed significantly 
but penalizes dissimilar partitioning when there are only 
minor changes in the network. 

We present a novel measure of partition distance, called estrangement, 
which quantifies the extent to which neighbors continue to share 
community affiliation. This is motivated by the empirical observation 
that it is some form of social inertia inherent to group affiliation 
choices that prevents the community structure from changing 
abruptly 7,8 . The estrangement between two time-ordered snapshots is 
defined as the fraction of edges whose endpoints stop sharing their community 
affiliation with time. In other words, estrangement is the fraction of 
intra-community edges that become inter-community edges as the 
network evolves to the subsequent snapshot, as illustrated in Fig. 1. 

Our method of detecting temporal communities consists of maximizing 
modularity in a snapshot subject to a constraint on the 
estrangement from the discovered partition in the previous snapshot. 
The amount of estrangement allowed controls the smoothness of the 
evolution of temporal communities, and varying it reveals various 
levels of resolution of temporal evolution of the network. The 
estrangement constrained modularity maximization problem described 
above is at least as hard as modularity maximization, which is 
NP-complete 9 . Moreover, known heuristic methods for unconstrained 
modularity maximization are not directly applicable to the 
constrained version. However, we show that the dual problem 
constructed using Lagrangian relaxation can be tackled by adapting 
techniques used for unconstrained modularity maximization, 
specifically a version of the Label Propagation Algorithm (LPA) 10,11 . 

Some recent proposals for detecting temporal communities, similarly 
to ours, use the past community structure. Mucha et al. 12 extend 
the notion of random walk stability, introduced by Lambiotte et al. 13 , 
to multi-slice networks and show that optimization of this stability 
yields coherent temporal communities. (Incidentally, estrangement 
can be interpreted as temporal stability, as we show in SI.) However, 
their method is not sequential as it requires all slices (snapshots) to be 
aggregated into a stacked graph by introducing arbitrary weighted 
links between node copies in different slices. No principled method is 
presented for picking the weights of the inter-slice links. Our method 
is closest to evolutionary clustering introduced by Chakrabarti et al. 14 , 
where the quality of a community partition is measured by a combination 
of its snapshot cost and its temporal cost. However, unlike 
our method, the work of Chakrabarti et al. 14 does not prescribe 
specific relative contributions of the two costs, or demonstrate the 
effect of varying these contributions. Furthermore, the partition distance 
measure and the optimization techniques we use are different. 
GraphScope 15 finds temporal communities by breaking the sequence 
of graph snapshots into graph segments and finding good communities 
within each graph segment such that the total cost of encoding the 
sequence of graphs is minimized. However, it can only be used on 
unweighted networks. Subsequent techniques such as FacetNet 16 and 
MetaFac 17 apply the evolutionary clustering approach to partitions 
derived from a generative mixture model approximation of the network 
adjacency matrix. A distinctive drawback of generative models 
in the context of community detection is the necessity of providing, a 
priori, the number of communities in the network, or using community 
quality function based methods to find the most suitable 
Figure 2 | Mapping community labels from time $t-1$ to time $t$. The left 
panel shows the situation after estrangement confinement has found a 
partition of the graph at time $t$, consisting of three communities. Two of 
these have arisen due to an uneven split of the red community at $t-1$, and 
one due to the merging of the blue and green communities at time $t-1$. The 
mapping procedure causes fewest nodes to change labels from $t-1$ to $t$. The 
center panel shows the bipartite construction that the mapping procedure 
uses. Here, nodes on the left (set $U$) represent communities at $t-1$ and 
nodes on the right (set $V$) represent those at time $t$. Each node in $U$ has an 
outgoing link to the node in $V$ with which its Jaccard overlap is maximal. 
Similarly, each node in $V$ has an outgoing link to the node in $U$ with which 
its Jaccard overlap is maximal. For simplicity, we say that each node points 
to its maximal overlap partner in the other set. Once these links are drawn, 
the mapping procedure allows inheritance of labels only between pairs of 
nodes which have bidirectional links between them, i.e., a node in $U$ 
(community at $t-1$) passes on its label to a node in $V$ (community at $t$) 
only if they are maximal overlap partners of each other. Consequently, a 
node in $U$ which is not bidirectionally connected to any node in $V$ does not 
pass on its label (e.g., the green node in $U$). Similarly, a node in $V$ which is 
not bidirectionally connected to any node in $U$ does not inherit a label, and 
therefore obtains a new label (e.g., the topmost node in $V$). The progression 
of appropriately labeled communities from $t-1$ to $t$ after the mapping step 
is shown in the panel on the right. 

number of communities a posteriori. Also, these modeling techniques 
assume that the networks are generated by a given stochastic 
data model. However, as argued by Breiman 18 , the utility of such 
techniques is limited by the accuracy of the models, which are generally 
difficult to design for complex networks. In contrast, our 
approach is based on empirically observed social inertia in community 
affiliation and does not try to model the possibly complex evolution 
of the network itself. Thus, in summary, our method is 
sequential, does not need any generative model for network structure 
or evolution, and is applicable to both weighted and unweighted 
networks. 

Results 

Our key results include the definition of the novel partition distance 
measure of estrangement, a formulation of the problem of finding 
temporal communities as a constrained optimization problem, an 
efficient agglomerative method to solve the problem that relies critically 
on the locally decomposable nature of estrangement, and an 
analysis of the temporal communities found by our method in various 
synthetic and real complex networks. 

Problem formulation. Given network snapshots $G_{t-1}$, $G_t$ and the 
partition $P_{t-1}$ that represents the community structure at time $t-1$, 
find a partition $P_t$ of $G_t$ that solves the following constrained 
optimization problem: 

$$\begin{aligned} &\underset{P \in \mathcal{P}}{\text{maximize}} && Q(P) \\ &\text{subject to} && E(P) \le \delta. \end{aligned} \tag{1}$$

Here $Q$ is a quality function for the community structure in a snapshot, 
$\mathcal{P}$ denotes the space of all partitions, and $E$ is a measure of 
distance or dissimilarity between the community structure at times 
$t$ and $t-1$. The formulation above is based on the intuition that 
temporal communities can be detected by optimizing for quality in 
the current snapshot while ensuring that the distance from the past 
community structure is limited to a certain amount, as specified by 
the parameter $\delta$. Smaller values of $\delta$ imply greater emphasis on 
temporal contiguity whereas larger values of $\delta$ place greater focus 
on finding better instantaneous community structure. Hence, we 
refer to $\delta$ as the temporal divergence, or simply divergence. We 
emphasize that our formulation is independent of the specific community 
structure quality function used. In this paper, we use modularity 4 , 
a widely studied and tested quality function, which is defined 
as: 

$$Q = \frac{1}{2M} \sum_{u,v} \left( A_{uv} - \frac{k_u k_v}{2M} \right) \delta(l_u, l_v), \tag{2}$$

where $A$ is the adjacency matrix for the network, $k_x$ is the degree of 
node $x$, $l_x$ is the label assigned to $x$ in this partition, and $M$ is the total 
number of edges in the network. $\delta(i, j)$ is 1 if and only if $i = j$, and 0 
otherwise. Here a partition $P$ is specified by the labels $\{l_1, l_2, \ldots, l_n\}$ 
assigned to the nodes. Modularity has also been generalized to 
weighted networks 19 . 
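As an illustration of the modularity definition above, the following is a minimal Python sketch, not the authors' implementation. The function name `modularity` and the edge-dictionary input format are assumptions made for this example. It groups the double sum by community, so that the adjacency term becomes twice the intra-community edge weight and the null-model term becomes the sum of squared community strengths divided by $2M$.

```python
from collections import defaultdict

def modularity(edges, labels):
    """Modularity Q of a labelled, undirected (optionally weighted) graph.

    edges  -- dict {(u, v): weight}, each undirected edge listed once, no self-loops
              (hypothetical input format chosen for this sketch)
    labels -- dict {node: community label}
    """
    M = sum(edges.values())           # total edge weight (edge count if unweighted)
    strength = defaultdict(float)     # k_u: degree, or strength for weighted graphs
    intra = 0.0                       # weight of edges whose endpoints share a label
    for (u, v), w in edges.items():
        strength[u] += w
        strength[v] += w
        if labels[u] == labels[v]:
            intra += w
    K = defaultdict(float)            # K_l: summed strength of the nodes labelled l
    for u, k in strength.items():
        K[labels[u]] += k
    # Over ordered pairs (u, v): the A_uv term sums to 2 * intra, and the
    # k_u * k_v / 2M term sums to (sum over labels of K_l^2) / 2M.
    return (2 * intra - sum(k * k for k in K.values()) / (2 * M)) / (2 * M)

# Toy graph: two triangles joined by a single bridge edge (2, 3).
edges = {(0, 1): 1, (1, 2): 1, (0, 2): 1,
         (3, 4): 1, (4, 5): 1, (3, 5): 1, (2, 3): 1}
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {n: "a" for n in range(6)}
q_two = modularity(edges, two_groups)   # equals 5/14 for this toy graph
q_one = modularity(edges, one_group)    # a single community always gives Q = 0
```

On this toy graph the two-triangle partition scores higher than the trivial one-community partition, as expected.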

For measuring partition distance, we use our novel measure of 
estrangement, which we now define precisely. Given network snapshots 
$G_{t-1}$, $G_t$ and partitions $P_{t-1}$ and $P_t$, an edge $(u, v)$ in $G_t$ is said to 
be estranged if $l_u \neq l_v$ in $P_t$, given that $u$ and $v$ were neighbors in $G_{t-1}$ 
and $l_u = l_v$ in $P_{t-1}$. Estrangement is now defined as the fraction of 
estranged edges in $G_t$. Note that equality of labels is required only 
within partitions, not across partitions. Estrangement can be written 
as: 

$$E = \frac{1}{2M} \sum_{u,v} Z_{uv} \left( 1 - \delta(l_u^{t}, l_v^{t}) \right), \tag{3}$$

where $Z_{uv} = \delta(l_u^{t-1}, l_v^{t-1}) \sqrt{A_{uv}^{t-1} A_{uv}^{t}}$, and $A^{t-1}$ and $A^{t}$ are the 
adjacency matrices of $G_{t-1}$ and $G_t$ respectively. The square root term 
ensures that the definition applies to weighted networks as well, 
where $M$ is taken to be the sum of the weights of all the edges in 
the network. Specifically, the term $\sqrt{A_{uv}^{t-1} A_{uv}^{t}}$ implies that if the 
weight of an edge whose endpoints continue to share labels changes 
from time $t-1$ to $t$, we take the geometric mean of the weights when 
computing the partition distance. Estrangement can take values 
between 0 and 1, with 0 estrangement implying maximum possible 
similarity between the community structure in the two snapshots of 
the network and a value of 1 implying maximum possible dissimilarity. 
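The definition of estrangement can be made concrete with a short Python sketch. This is an illustrative reading of the definition rather than the authors' code: the function name `estrangement` and the edge-dictionary format are assumptions, each undirected edge is stored once, and the estranged weight is therefore divided by $M$ (the total edge weight of the current snapshot) rather than by $2M$.

```python
import math

def estrangement(edges_prev, edges_curr, labels_prev, labels_curr):
    """Weighted fraction of edges in the current snapshot that are estranged.

    edges_*  -- dict {(u, v): weight}, each undirected edge listed once
                (hypothetical input format chosen for this sketch)
    labels_* -- dict {node: community label} for the corresponding snapshot
    """
    M = sum(edges_curr.values())
    prev = {frozenset(e): w for e, w in edges_prev.items()}  # orientation-free lookup
    estranged = 0.0
    for (u, v), w in edges_curr.items():
        w_prev = prev.get(frozenset((u, v)), 0.0)
        if w_prev == 0.0:
            continue  # u and v were not neighbors at t-1, so the edge cannot be estranged
        shared_before = labels_prev[u] == labels_prev[v]
        shared_now = labels_curr[u] == labels_curr[v]
        if shared_before and not shared_now:
            estranged += math.sqrt(w_prev * w)  # geometric mean of the two weights
    return estranged / M

# Two triangles joined by the bridge (2, 3); the network itself does not change.
edges = {(0, 1): 1, (1, 2): 1, (0, 2): 1,
         (3, 4): 1, (4, 5): 1, (3, 5): 1, (2, 3): 1}
before = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
renamed = {0: "x", 1: "x", 2: "x", 3: "y", 4: "y", 5: "y"}  # same groups, new names
moved = {0: "a", 1: "a", 2: "b", 3: "b", 4: "b", 5: "b"}    # node 2 switches sides

e_renamed = estrangement(edges, edges, before, renamed)  # 0: only within-partition equality matters
e_moved = estrangement(edges, edges, before, moved)      # edges (0,2) and (1,2) are estranged: 2/7
```

Renaming the labels leaves estrangement at zero because labels are compared only within a partition, whereas moving node 2 across the bridge estranges its two former intra-community edges.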

Duality based optimization approach. Greedy local optimization 
methods used for modularity maximization cannot be directly used 
to solve the constrained optimization problem in Eq. 1, since the 
space of solutions is now confined to the set of partitions which 
respect the constraint. We use the Lagrangian duality approach for 
constrained optimization. Henceforth, for notational simplicity, 
unless otherwise stated, all quantities of interest are with respect to 
the current snapshot G t . Following the dual formulation 20 , we write 
the Lagrangian L and the Lagrange dual function g corresponding to 
the primal problem (Eq. 1) as: 

$$L(P, \lambda) = Q - \lambda (E - \delta)$$

$$g(\lambda) = \sup_{P \in \mathcal{P}} L(P, \lambda) \tag{4}$$
Figure 3 | (a) An example of ground truth temporal communities that are used to generate evolving synthetic networks according to the Markovian 
evolution shown in (b) and explained below. We use an impressionistic visualization to show the temporal communities, which we refer to as the evolution 
chart, in which nodes are shown along the Y axis and the snapshot number along the X axis. Each "pixel" in the evolution chart corresponds to a particular 
node at a given time, and the color represents the community of that node at that time $t$. The left panel shows a network consisting of $n = 50$ nodes with 
two temporal communities consisting of 20 nodes each. The first community arises from the Markovian evolution of nodes 0-19 over the first 10 
snapshots, while the second arises from the Markovian evolution of nodes 30-49 over the last 10 snapshots. (b) Schematic of Markovian evolution for 
intra-community edges, where an edge that exists in the current snapshot disappears with probability $p$ in the subsequent snapshot, while a non-existing 
edge appears with probability $q$, which is chosen such that the group density $p_c$ is preserved. 



where $\lambda$ is the Lagrange multiplier. For every non-negative value of $\lambda$, the function 
$g(\lambda)$ yields an upper bound to the optimal value $Q^*$ of the primal 
problem. We are interested in the value of $\lambda$ that yields the smallest 
upper bound, which would in turn give us the best estimate of $Q^*$ 
subject to the constraint on $E$. This dual problem corresponding to 
the primal problem in Eq. 1 is: 

$$\begin{aligned} &\text{minimize} && g(\lambda) \\ &\text{subject to} && \lambda \ge 0. \end{aligned} \tag{5}$$

If the minimum of $g(\lambda)$ occurs at $\lambda^*$, the optimal partition for a 
given snapshot is one that yields the supremum of $L(P, \lambda^*)$ over all 
partitions. 
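The upper-bound property of the Lagrange dual function used in this argument follows from a short standard derivation (weak duality), written out here for completeness. The symbols follow the problem formulation in Eq. 1; the primed variable $P'$ is introduced only for this derivation.

```latex
% Weak duality for the primal problem (Eq. 1); Q^* denotes its optimal value.
% Let P be any feasible partition, i.e. E(P) \le \delta, and let \lambda \ge 0.
\begin{align*}
L(P, \lambda) &= Q(P) - \lambda \bigl( E(P) - \delta \bigr) \;\ge\; Q(P)
  && \text{since } \lambda \ge 0 \text{ and } E(P) - \delta \le 0, \\
g(\lambda) &= \sup_{P' \in \mathcal{P}} L(P', \lambda) \;\ge\; L(P, \lambda) \;\ge\; Q(P).
\end{align*}
% Taking the maximum over all feasible P gives g(\lambda) \ge Q^* for every \lambda \ge 0.
```

Because the space of partitions is discrete, the smallest such bound need not coincide with $Q^*$ exactly; the derivation guarantees only the inequality.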

Solving the dual problem to find the best partition requires computing 
the Lagrange dual function $g(\lambda)$, which itself involves a maximization. 
We show that the Lagrange dual can be computed by 
adapting known methods for unconstrained modularity maximization. 
We introduce a hierarchical version of LPA 10 , which we refer to 
as HLPA, and which works by greedily merging communities that 
provide the largest gain in the objective function, and then repeating 
the procedure on an induced graph in which the communities from 
the previous steps are the nodes. In general, variants of LPA can be 



constructed by modifying the local objective function that the label 
update is maximizing. Barber and Clark 10 propose one such variant, 
LPAm, for modularity maximization. We construct the label update 
rule for HLPA in a similar vein for the optimization problem given by 
Eq. 4. Recall that a partition $P$ is specified by the labels $\{l_1, l_2, \ldots, l_n\}$ 
assigned to the nodes. Then, in HLPA, each node $x$ updates its label $l_x$ 
following the rule: 

$$l_x = \arg\max_{l} \left( N_{xl} - \frac{k_x K_l}{2M} + \frac{k_x^2}{2M} \delta(l_x, l) + \lambda O_{xl} \right), \tag{6}$$

where $N_{xl} = \sum_u A_{ux} \delta(l_u, l)$, $O_{xl} = \sum_u Z_{ux} \delta(l_u, l)$ and 
$K_l = \sum_u k_u \delta(l_u, l)$. Here $O_{xl}$ is the extra term that arises due to the 
constraint on $E$. We show in Methods that the above update rule 
converges to a local optimum of $L(P, \lambda)$, and also that the optimization of 
$L$ is further improved by the additional hierarchical procedure 
present in HLPA. We note that HLPA works well for optimizing $L(P, \lambda)$ 
because estrangement, similarly to modularity, can be decomposed 
into node-local terms, which allows $L$ to be optimized by each node 
updating its label based on those in its neighborhood. 
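A single application of the label update rule (Eq. 6) can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the HLPA implementation: the function name `hlpa_label_update` and the dict-of-dicts inputs are hypothetical, candidate labels are restricted to the node's own label and those of its neighbors (as is usual in label propagation), ties keep the current label, and the hierarchical merging step is omitted.

```python
def hlpa_label_update(x, A, Z, labels, lam):
    """Return the label maximizing the node-local objective for node x.

    A      -- symmetric dict of dicts, A[u][v] = edge weight in the current snapshot
    Z      -- dict of dicts, Z[u][v] = Z_uv; missing entries are treated as 0
    labels -- dict {node: current label}; labels must be sortable (e.g. strings)
    lam    -- Lagrange multiplier (lambda >= 0)
    All names and input formats are assumptions made for this sketch.
    """
    strength = {u: sum(nbrs.values()) for u, nbrs in A.items()}  # k_u
    two_M = sum(strength.values())                               # 2M
    K = {}                                                       # K_l per label
    for u, k in strength.items():
        K[labels[u]] = K.get(labels[u], 0.0) + k
    k_x = strength[x]

    def score(l):
        N = sum(w for u, w in A[x].items() if labels[u] == l)          # N_xl
        O = sum(z for u, z in Z.get(x, {}).items() if labels[u] == l)  # O_xl
        own = 1.0 if labels[x] == l else 0.0                           # delta(l_x, l)
        return N - k_x * K[l] / two_M + own * k_x ** 2 / two_M + lam * O

    best = labels[x]
    best_score = score(best)
    # Assumption: only labels carried by neighbors of x are considered as alternatives.
    for l in sorted({labels[u] for u in A[x]} - {best}):
        s = score(l)
        if s > best_score:  # strict inequality: ties keep the current label
            best, best_score = l, s
    return best

# Two triangles joined by the bridge (2, 3); node 2 currently carries label "b".
A = {n: {} for n in range(6)}
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[u][v] = A[v][u] = 1.0
labels = {0: "a", 1: "a", 2: "b", 3: "b", 4: "b", 5: "b"}
# Hypothetical history: at t-1, nodes 2 and 3 were linked and shared a label.
Z = {2: {3: 1.0}, 3: {2: 1.0}}

unconstrained = hlpa_label_update(2, A, Z, labels, lam=0.0)  # "a": modularity alone wins
constrained = hlpa_label_update(2, A, Z, labels, lam=2.0)    # "b": inertia term dominates
```

With $\lambda = 0$ the rule reduces to a modularity-only update and node 2 joins its triangle; with a sufficiently large $\lambda$ the $O_{xl}$ term keeps it with its previous community partner.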

Once the Lagrange dual has been computed, we solve the dual 
problem (Eq. 5) of finding the best Lagrange multiplier by using 
Figure 4 | Estrangement confined modularity maximization allows detection of temporal communities in the benchmark network presented in Fig. 3. 
The Markovian evolution parameters (described in text) for this network are set to $p = 0.6$ and $q = p\,p_c/(1 - p_c) = 0.4$, which preserve the edge density ($p_c = 0.4$) 
within a dense group (unless it ceases to exist). The density of connections in the random network background is $p_r = 0.05$. The ability to detect the 
temporal communities diminishes as the constraint on estrangement is relaxed ($\delta \to 1$), with the poorest results obtained for independent modularity 
maximization ($\delta = 1$). 



Brent's method 24 , which is commonly used for non-differentiable 
objective functions (see Methods). 

The optimization procedure and the HLPA update rule presented 
above apply to weighted networks as well by considering $k_u$ to be the 
strength of node $u$ instead of the degree, where strength is defined as 
the sum of the weights of adjacent edges, and by considering $M$ to be 
the sum of the weights of all the edges in the network. 

Finally, after the best community partition for $G_t$ has been found, 
we need to find an appropriate mapping of communities at time $t$ to 
those found at time $t-1$. We use a mutually maximal matching 
procedure illustrated in Fig. 2. Specifically, we map those communities 
across two consecutive snapshots that have the maximal mutual 
Jaccard overlap between their constituent node-sets (Jaccard similarity 
of two sets is defined as the size of their intersection set divided by 
the size of their union), and generate new identifiers only when 
needed. 
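The mutually maximal matching step can be sketched in Python as below. This is an illustrative reconstruction from the description above, not the authors' code. The names `jaccard` and `map_labels`, the use of integer labels for previous communities, the first-encountered tie-breaking of `max`, and the scheme for generating fresh labels are all assumptions.

```python
from itertools import count

def jaccard(a, b):
    """Jaccard similarity: size of the intersection divided by size of the union."""
    return len(a & b) / len(a | b)

def map_labels(prev, curr):
    """Map communities at time t to labels inherited from time t-1.

    prev -- dict {integer label: set of nodes} at t-1
    curr -- dict {temporary id: set of nodes} at t
    Returns {temporary id: label}. A label is inherited only between communities
    that are each other's maximal-overlap partner; all others get a fresh label.
    """
    def partner(nodes, others):
        # Key of the community in `others` with the largest Jaccard overlap,
        # or None when there is no overlap at all (ties: first encountered).
        key = max(others, key=lambda k: jaccard(nodes, others[k]), default=None)
        if key is None or jaccard(nodes, others[key]) == 0:
            return None
        return key

    to_curr = {p: partner(s, curr) for p, s in prev.items()}
    to_prev = {c: partner(s, prev) for c, s in curr.items()}
    fresh = count(max(prev, default=-1) + 1)  # assumption: labels are integers
    mapping = {}
    for c in curr:
        p = to_prev[c]
        if p is not None and to_curr[p] == c:
            mapping[c] = p            # bidirectional link: inherit the old label
        else:
            mapping[c] = next(fresh)  # no mutual partner: new identifier
    return mapping

# Scenario resembling Fig. 2: community 0 splits unevenly, communities 1 and 2 merge.
prev = {0: {1, 2, 3, 4, 5, 6}, 1: {7, 8, 9}, 2: {10, 11}}
curr = {"c0": {1, 2, 3, 4}, "c1": {5, 6}, "c2": {7, 8, 9, 10, 11}}
mapping = map_labels(prev, curr)
```

In this example the larger fragment of the split community keeps label 0, the merged community inherits label 1 from its larger constituent, label 2 is not passed on, and the smaller fragment receives the new label 3.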

Temporal communities in synthetic and empirical networks. Next, 
we apply the estrangement confinement method to synthetic and real 
networks and show the temporal communities obtained by varying 
the temporal divergence allowed, and their relation to ground-truth or 
meta-data where available. 

We start by describing our method to generate realistic synthetic 
benchmarks for testing temporal communities. Given a target temporal 
community structure, we generate a snapshot sequence consisting 
of dense groups (corresponding to the communities) embedded 
in a random background, with links in the dense groups undergoing 
Markovian evolution and thus giving rise to a temporal community 
that persists over some period of time. An example target temporal 
community structure is shown in Fig. 3, which consists of two 



temporal communities of 20 nodes each that exist for the first 10 and 
the last 10 snapshots respectively in 25 snapshots of a 50 node network. 
Each of the remaining nodes is a community by itself which 
lasts for exactly one snapshot, or equivalently, does not belong to any 
temporal community. 

The initial snapshot in the synthetic networks consists of an 
instance of an Erdős-Rényi random graph, ER($n$, $p_r$), among the $n$ 
nodes where any edge exists independently with probability $p_r$, and 
intra-community edges exist with an additional probability of $p_c$ 
(over the background probability of $p_r$). Subsequent snapshots are 
generated by first creating a new random instantiation of ER($n$, $p_r$), 
and enforcing a Markovian evolution for the edges within a temporal 
community while it exists in the target temporal community 
structure. Specifically, an edge that exists in the current snapshot 
disappears with probability $p$ in the subsequent snapshot, while a 
non-existing edge appears with probability $q$ (Fig. 3). The Markovian 
evolution thus gives rise to a temporal community that persists over 
some period of time depending on the values of $p$ and $q$ chosen, since 
these parameters control the edge density within the community. For 
the choice $q = p\,p_c/(1 - p_c)$, the initial edge density within the community 
is preserved in the subsequent evolution. Using this prescription, we 
generate different sequences of network evolution for the ground 
truth temporal communities shown in Fig. 3 by varying $p_c$ and $p$ 
and setting $p_r = 0.05$. 
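The Markovian edge evolution inside a dense group can be sketched as follows. This is a minimal Python illustration, not the benchmark generator used in the paper: it covers only the intra-group chain (the ER background that is redrawn in every snapshot is omitted), and the function names `preserving_q` and `evolve_group` are assumptions. The density-preserving choice of $q$ follows from the stationary density of the two-state chain, $q/(p+q)$, being equal to $p_c$.

```python
import random
from itertools import combinations

def preserving_q(p, p_c):
    """Appearance probability q that keeps the stationary density q / (p + q) at p_c."""
    return p * p_c / (1.0 - p_c)

def evolve_group(edges, nodes, p, q, rng):
    """One Markovian step for the node pairs inside a dense group.

    edges -- set of pairs (u, v) with u < v that currently exist
    An existing edge disappears with probability p; a missing one appears with
    probability q. The random background is not modelled in this sketch.
    """
    nxt = set()
    for pair in combinations(sorted(nodes), 2):
        if pair in edges:
            if rng.random() >= p:   # survives with probability 1 - p
                nxt.add(pair)
        elif rng.random() < q:      # appears with probability q
            nxt.add(pair)
    return nxt

rng = random.Random(0)              # fixed seed so the sketch is repeatable
nodes = range(20)
p, p_c = 0.6, 0.4
q = preserving_q(p, p_c)            # 0.4 for these parameter values
pairs = list(combinations(nodes, 2))
edges = {pair for pair in pairs if rng.random() < p_c}   # initial density near p_c
densities = []
for _ in range(200):
    edges = evolve_group(edges, nodes, p, q, rng)
    densities.append(len(edges) / len(pairs))
mean_density = sum(densities) / len(densities)           # stays close to p_c
```

Averaged over many steps, the intra-group density stays close to $p_c$ even though a large fraction of the edges is replaced at every step.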

In Fig. 4, we show the evolution chart of our results on the above 
synthetic networks with $p_c = 0.4$ and $p = 0.6$. This implies that the 
average density of edges inside the dense groups is 0.4 and 60% of the 
edges change in each snapshot. For low enough values of $\delta$, our 
method is able to detect the temporal communities, even in this 
rapidly evolving network. Independent modularity maximization 
Figure 5 | Effect of varying the temporal divergence on the distance from the ground truth for the benchmark evolving network described in Fig. 4. The 
average loss in modularity (over all snapshots) due to the estrangement constraint decreases as the constraint is relaxed. Estrangement, on the other hand, 
increases as the constraint is relaxed. The distance of the obtained temporal partition (labeled on the right y axis) from the benchmark ground truth as 
quantified by VI is lowest around $\delta = 0.05$. The range of this distance varies roughly between 2.5 and 3.1 (labeled on the right y axis) as $\delta$ is varied, which is 
significant since VI is a logarithmic measure (see Methods). Thus, by varying the constraint on estrangement we get a different set of temporal 
communities which could all be meaningful. The average loss in modularity relative to the average gain in estrangement indicates the range of values of $\delta$ 
which might yield meaningful temporal communities. 



(which corresponds to $\delta = 1$) is unable to detect the temporal communities, 
as shown in the rightmost panel in Fig. 4. For a more 
quantitative comparison of the detected temporal communities to 
those known to be present in the ground truth, we use VI, which is a 
common metric to evaluate the distance between two partitions of a 
set 5 . A static community structure is a partitioning of the set of nodes of the 
network, while a temporal community structure is defined as a partition of the 
set of {node, time} pairs. Thus, using VI we can measure the distance 
of the partition of {node, time} pairs produced by a temporal community 
detection algorithm from the partition defined in the ground 
truth shown in Fig. 3 (see Methods for details). 
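For reference, the variation of information between two partitions of the same set can be computed as in the following Python sketch. The function name `variation_of_information` is an assumption, as is the use of the natural logarithm; the base used for the reported values is specified in Methods and may differ. The elements here are {node, time} pairs, matching the way temporal partitions are compared above.

```python
import math
from collections import Counter

def variation_of_information(x, y):
    """VI(X, Y) = H(X|Y) + H(Y|X) for two labelings of the same elements.

    x, y -- dicts {element: label} defined over identical keys, e.g.
            {(node, time): label}. Natural logarithm is an assumption here.
    """
    n = len(x)
    joint = Counter((x[e], y[e]) for e in x)   # co-occurrence counts of label pairs
    px = Counter(x.values())                   # cluster sizes in x
    py = Counter(y.values())                   # cluster sizes in y
    vi = 0.0
    for (a, b), c in joint.items():
        r = c / n
        vi -= r * (math.log(r / (px[a] / n)) + math.log(r / (py[b] / n)))
    return vi

# Four {node, time} pairs: two nodes observed in two snapshots.
pairs = [(0, 0), (1, 0), (0, 1), (1, 1)]
by_node = {e: e[0] for e in pairs}       # each node is its own temporal community
by_time = {e: e[1] for e in pairs}       # each snapshot is its own community
renamed = {e: "n%d" % e[0] for e in pairs}
single = {e: "all" for e in pairs}

vi_same = variation_of_information(by_node, renamed)   # 0: identical up to label names
vi_coarse = variation_of_information(by_node, single)  # ln 2: one partition refines the other
vi_cross = variation_of_information(by_node, by_time)  # 2 ln 2: the partitions are independent
```

VI depends only on the grouping and not on the label names, which is why it can compare a detected temporal partition with the ground truth directly.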

In Fig. 5, we show the effect of varying $\delta$ on the synthetic network 
used in Fig. 4. Low values of $\delta$ yield low estrangement but also yield a 
lower value of modularity compared to what would result from 
unconstrained modularity maximization. Thus, reduction in 
estrangement comes at the expense of modularity. There appears 
to be no "correct" value of $\delta$ for obtaining a meaningful structure, 
but in practice very low values of $\delta$ (0.05 or less) provide smooth 
communities. Fig. 5 shows that VI is about 0.5 lower (which is substantial 
since VI is a logarithmic measure) for low values of $\delta$ than the 
maximum VI seen. Despite fluctuations in VI values due to the 
stochasticity inherent in greedy optimization on partition space, 
the VI curve demonstrates clearly that significantly lower VI values 
(relative to the characteristic size of fluctuations) are achieved below 
some value of $\delta$. It is difficult to estimate this threshold value of $\delta$, but 
an empirical plot like Fig. 5 can provide insights into the range of $\delta$ 
values to which the detection can be restricted. A possible heuristic 
that works well in practice is choosing values of $\delta$ lower than the 
point at which the average loss in modularity roughly equals the 
average estrangement. However, this is an ad hoc prescription and 
a limitation of our method is that the desired smoothness is not 



determined a priori. Similar difficulties are also inherent in several 
other methods 12,14,16,17 . 

We now compare our method with other known methods on a 
series of synthetic networks generated by varying $p_c$ and $p$, which 
corresponds to varying the density of edges inside a community 
and the rate at which they evolve, respectively. As shown in Fig. 6, 
our method consistently detects a temporal community structure 
that is most similar to the ground truth as compared to those found 
by the multislice modularity method 12 and independent modularity 
maximization in each snapshot along with label-mapping. For our 
method we pick the minimum value of VI that is achieved as $\delta$ is 
varied between 0 and 0.1. A minimum of VI is usually attained at a 
value of $\delta$ between 0.0 and 0.05. For the multislice method we pick 
the minimum VI achieved by varying the inter-slice coupling $\omega$ 
between 0.05 and 1. We find that the multislice modularity method finds 
the two communities, but is less adept at detecting the temporal 
variation, i.e., the birth and death of the large temporal communities, 
for even small values of $\omega$. Furthermore, for even marginally high 
values of $\omega$ (e.g. 0.2), it finds large spurious temporal communities. 
The performance of all three methods improves with increase in 
intra-community edge density. The rate of change, $p$, has a noticeable 
effect on performance only for low values of $p_c$. 

Having shown the performance of our method on a range of 
synthetic benchmarks, we next turn to the analysis of a real network: 
the human contact network data provided by the Reality Mining 
project 21 , which tracked the mobility of about a hundred individuals 
over nine months. A contact is registered when the Bluetooth devices 
being carried by the individuals come within 10 m of each other. The 
evolution chart in Fig. 7 shows the temporal communities resulting 
from applying estrangement confinement to snapshots created by 
aggregating contacts between individuals over a week (except over 
Figure 6 | Estrangement confined modularity maximization (curves labeled Estrangement) consistently finds a temporal community structure that is 
most similar to the ground truth temporal community structure as compared to multislice modularity maximization (curves labeled Multislice) and 
independent modularity maximization within each snapshot (curves labeled Independent). Note that even for independent modularity maximization, 
we perform the label mapping step between successive partitions. The distance between the two temporal partitions is measured using VI (see text). The 
comparison is done on a range of evolving synthetic networks (constructed as described in Fig. 3(b)) obtained by varying the rate of change of 
intra-community edges, $p$, for different values of intra-community edge density, $p_c$. For each benchmark we show the minimum VI obtained by Estrangement 
as the constraint $\delta$ is varied and similarly the minimum VI obtained by Multislice as the inter-slice coupling is varied. 



vacation weeks in December) thus creating a weighted time evolving 
network, where in each snapshot the weight on an edge represents 
the number of contacts between the corresponding individuals. The 
nodes are ordered on the Y axis by the tuple of labels they take over 
time, where the labels in the tuple itself are ordered by the frequency 
of acquiring that label. Ties are broken by the time of first appearance 
of nodes. This ordering causes the nodes in a temporal community to 
appear contiguously. We illustrate communities and events that can 
be correlated with ground truth in Fig. 7. 
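
The snapshot construction described above can be sketched in a few lines. This is a minimal illustration rather than the authors' preprocessing code: the function name `weekly_snapshots`, the `(timestamp, u, v)` record format and the fixed 7-day window counted from `t0` are all assumptions, and the exclusion of vacation weeks is left out.

```python
from collections import Counter, defaultdict

WEEK_SECONDS = 7 * 24 * 3600

def weekly_snapshots(contacts, t0=0):
    """Aggregate (timestamp_seconds, u, v) contact records into weekly snapshots.

    Returns {week_index: Counter({(u, v): number_of_contacts})}, where each
    undirected edge is stored once with its endpoints in sorted order.
    """
    snapshots = defaultdict(Counter)
    for t, u, v in contacts:
        week = int((t - t0) // WEEK_SECONDS)
        edge = (u, v) if u <= v else (v, u)  # treat contacts as undirected
        snapshots[week][edge] += 1           # edge weight = contact count
    return dict(snapshots)

# Hypothetical contact log: two contacts in week 0, one in week 1.
contacts = [(0, "a", "b"), (100, "b", "a"), (700_000, "a", "c")]
snaps = weekly_snapshots(contacts)
```

Each snapshot is then a weighted edge list on which community detection can be run in sequence.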

Finally, we analyze a time-evolving weighted network consisting of 
United States senators where the weight on an edge represents the 
similarity of their roll call voting behavior. The data was obtained 
from voteview.com and the similarity between a pair of senators was 
computed following Waugh et al. 22 as the number of bills on which 
they voted similarly, normalized by the number of bills they both 
voted on. The network consists of 111 snapshots corresponding to 
congresses over 220 years and 1916 unique senators. In Fig. 8, we 
show the evolution chart for $\delta = 0.05$, the value at which loss in 
modularity roughly equals the gain in estrangement. 
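
The edge weight described above (agreements divided by jointly voted bills) can be illustrated as follows. This is a sketch under stated assumptions: the function name, the dictionary-based vote records and the choice to return 0.0 when two senators share no bills are illustrative and not taken from Waugh et al. or from the voteview.com data format.

```python
def voting_similarity(votes_a, votes_b):
    """Fraction of jointly voted bills on which two senators voted the same way.

    votes_a, votes_b: dicts mapping a bill identifier to the vote cast.
    """
    shared = votes_a.keys() & votes_b.keys()  # bills both senators voted on
    if not shared:
        return 0.0  # assumption: no common bills means no edge weight
    agree = sum(1 for bill in shared if votes_a[bill] == votes_b[bill])
    return agree / len(shared)

# Hypothetical roll call records for two senators.
senator_a = {1: "yea", 2: "nay", 3: "yea"}
senator_b = {2: "nay", 3: "nay", 4: "yea"}
weight = voting_similarity(senator_a, senator_b)  # bills 2 and 3 shared, one agreement
```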

A broad feature that is observed for all values of temporal divergence 
is the emergence of two dominant voting communities with 
time. The party affiliation of the majority of the constituent nodes 
within these communities allows us to identify them as the temporal 
streams which culminate in the present-day Democratic and 
Republican parties (Fig. 8(a)). These features were previously 
observed by Mucha et al. 12. However, in contrast to their method, 
ours is sequential and does not need to construct and analyze the 
stacked network comprising all snapshots. In addition to the dominant 
Democratic and Republican streams, we also detect two minor 
communities that consist of senators who predominantly vote in 
alignment with one of the two dominant communities, but occasionally 
switch to the other. One of these detected minor communities 
consists predominantly of Democratic members of the conservative 
coalition (Fig. 8(b)). The second minor community found consists of 
several moderate Democrats and left-leaning Republicans (Fig. 8(c)). 

Another feature we find is the reduction with time in the number 
of senators whose aggregate voting behavior over the duration of a 
congress is not aligned with the rest of their party. Fig. 8(d) shows 
the number of such "atypical" senators over time. Notice that after 
the year 1995, there is only one such senator detected by our method, 
whereas prior to 1995, a much larger number of senators voted 
differently from the bulk of their party. 

Discussion 

We have presented a novel approach to detect temporal communities 
based on a constrained optimization formulation. A critical piece of 
the formulation is the definition of estrangement, an effective measure 
of partition distance between snapshots of a time-varying network 
that is motivated by the tendency of nodes to maintain 
similarity of community affiliations with their neighbors. The constraint 
on estrangement allows us to pick solutions from the highly 
degenerate and sensitive modularity landscape that maintain temporal 
contiguity without compromising the current community 
structure. Our solution technique using Lagrangian duality relies 
on the fact that estrangement can be decomposed into local, single 
node terms. Our method operates on one snapshot at a time, thus 
allowing us to compute temporal communities in a sequential manner, 
which is particularly useful for large networks. Notably, even if 
all snapshots are available to us in advance, estrangement provides a 
non-trivial but intuitive control parameter using which a broad range 
of temporal smoothness can be probed, potentially enabling community 
discovery on many temporal scales. We demonstrate that 
meaningful temporal communities can be found by estrangement 
constrained modularity maximization. In particular, our demonstrations 
on empirical networks are corroborated by available ground 
truth and by previous studies which used non-sequential methods to 
discover temporal communities. 

Figure 7 | Temporal communities seen in the reality mining network 21. The network consists of two communities predominantly at lower values of $\delta$, one corresponding to staff and students at the MIT Media Lab (blue), and the other corresponding to students at the MIT Sloan School of Business (green). As $\delta$ is increased the average size of the communities decreases and the number of communities increases, a consequence of the decreasing temporal contiguity. 

Several issues are worthy of further study. A limitation of our 
method is that it does not provide a specific prescription for choosing 
values of the constraint $\delta$ that lead to meaningful temporal communities. 
Such a prescription will improve the utility of the method 
in practice. Another important issue is determining the granularity at 
which the time-varying network is snapshotted. If the snapshots are 
made too frequently, there may not be enough density of edges to 
discover communities, whereas aggregating for too much time may 
prevent detection of some evolving patterns. In this work, we assume 
that there is a natural timescale of interest for creating snapshots, 
such as the one defined by biennial congressional elections in the case 
of the senator voting similarity network. In general, such natural 
timescales can perhaps be found by analyzing the frequency spectrum 
of some relevant variable in the dataset 21. A related issue is that 
of sporadic interruptions in data collection which could affect the 
calculation of estrangement as well as the mapping of communities 
between snapshots. The effect of interruptions can be mitigated by 
using a history of the extent to which nodes share community affiliations 
to compute estrangement. Also, estrangement is generalizable 
to the case of overlapping communities (see SI) which could reveal 
further interesting features in community evolution. 

Methods 

We describe our Lagrangian duality based method for estrangement constrained 
optimization of modularity. As summarized in Results, we first need to compute the 
Lagrange dual function (Eq. 4), which we show can be computed by adapting known 
methods for unconstrained modularity maximization. The key to computing the dual 
lies in exploiting the property that estrangement is decomposable, similarly to 
modularity, into single node (or local) contributions. We utilize a hierarchical version 
of the Label Propagation Algorithm 10 to compute the dual. This method, which we 
refer to as HLPA, works by greedily merging communities that provide the largest 
gain in the objective function, and then repeating the procedure on an induced graph 
in which the communities from the previous steps are the nodes. Once this method of 
computing the Lagrange dual has been determined, we solve the dual problem of 
finding the best Lagrange multiplier by using Brent's method 24, which is commonly 
used for non-differentiable objective functions. We now present the above steps in 
greater detail. 

HLPA update rule for computing the Lagrange dual. We compute the Lagrange 
dual $g(\lambda)$ for a given $\lambda$, using HLPA in which each node $x$ updates its community 
identifier ($l_x$) following the rule: 

$$l_x = \arg\max_{l} \left( N_{xl} - \frac{k_x K_l}{2M} + \lambda O_{xl} \right) \tag{7}$$

where $N_{xl} = \sum_u A_{ux}\,\delta(l_u, l)$, $O_{xl} = \sum_u Z_{ux}\,\delta(l_u, l)$ and $K_l = \sum_u k_u\,\delta(l_u, l)$. Here $O_{xl}$ is 
the extra term that arises due to the constraint on $E$. Next we show that this update 
rule indeed performs a greedy maximization of the Lagrangian. Following Barber and 
Clark 10, we expand $Q$ and write $L(\mathcal{P},\lambda)$ as: 



$$L(\mathcal{P},\lambda) = \frac{1}{2M} \sum_u \sum_v \left( A_{uv} - \frac{1}{2M} k_u k_v + \lambda Z_{uv} \right) \delta(l_u, l_v) \tag{8}$$

Here we have taken advantage of the fact that the first term, $\frac{1}{2M} \sum_u \sum_v Z_{uv}$, in $E$ (Eq. 3) is 
independent of the partition and does not affect the optimization. To see the effect of a 
label update for a single node $x$, we separate terms of Eq. 8 into contributions from $x$ 
and those from all other nodes. Doing so yields: 
$$L = \frac{1}{2M} \sum_{u \neq x} \sum_{v \neq x} \left( A_{uv} - \frac{1}{2M} k_u k_v + \lambda Z_{uv} \right) \delta(l_u, l_v) - \frac{1}{2M} \left( A_{xx} - \frac{1}{2M} k_x^2 + \lambda Z_{xx} \right) + \frac{1}{2M} \left( 2 \sum_u \left( A_{ux} - \frac{1}{2M} k_u k_x + \lambda Z_{ux} \right) \delta(l_u, l_x) \right) \tag{9}$$

where in the interest of brevity, we have introduced the shortened notation $L$ to mean 
$L(\mathcal{P},\lambda)$. The first two terms in $L$ (R.H.S. of Eq. 9) are unaffected by the label update of 
node $x$, so we focus on the last term. Since our goal is to greedily optimize $L$ via label 
updates, we update the label of node $x$ to one that results in the maximal gain in $L$. 
Thus, the desired post-update label is: 

$$\begin{aligned} l_x &= \arg\max_{l} \frac{1}{2M} \left( 2 \sum_u \left( A_{ux} - \frac{1}{2M} k_u k_x + \lambda Z_{ux} \right) \delta(l_u, l) \right) \\ &= \arg\max_{l} \left( N_{xl} - \sum_u \frac{k_u k_x}{2M}\,\delta(l_u, l) + \lambda \sum_u Z_{ux}\,\delta(l_u, l) \right) \end{aligned} \tag{10}$$

where we have used the fact that $\sum_u A_{ux}\,\delta(l_u, l)$ is simply the number of neighbors of $x$ 
with label $l$, which we denote by $N_{xl}$. The diagonal terms (i.e. terms with $u = x$) in the 
remaining sums of the above equation do not have any bearing on the maximization 
and can be ignored. Then, using: 

$$\sum_u \frac{k_u k_x}{2M}\,\delta(l_u, l) = \frac{k_x K_l}{2M}$$

and writing $\sum_u Z_{ux}\,\delta(l_u, l)$ as $O_{xl}$ (also, $K_l = \sum_u k_u\,\delta(l_u, l)$), we see that Eq. 10 reduces 
to Eq. 7. It follows that the HLPA label update rule maximizes the gain in $L$. The 
optimization of $L$ in HLPA is further improved by adopting an additional hierarchical 
step after the labels have converged to a local maximum of $L$. We detail this hierarchical 
procedure below. 

Figure 8 | The different temporal communities observed in the senator voting similarity network. (a) The dominant temporal communities can be identified with the modern day Republican (in red) and Democratic (in blue) parties. (b) and (c) Minor communities are also found, which consist of Republicans and Democrats who often differ in their voting behavior from the majority membership of their party, as well as independents. (d) The number of atypical senators defined as those whose temporal communities are different from their declared party affiliations. Our results indicate that the number of such senators has been decreasing with time, perhaps indicating increasing polarization. 

Hierarchical procedure in HLPA for computing the Lagrange dual. Once the 
sequence of label updates has converged on the original graph on which $L$ is being 
maximized, we build a new induced graph which contains the communities of the 
original graph as nodes. Links between pairs of nodes in the new graph have weights 
equal to the total number of links between the two communities in the original graph 
that they correspond to. Then, $L$ can be further increased by updating the labels of 
nodes in the induced graph iteratively, following Eq. 6. Importantly, this is possible 
only because $L$ remains invariant in the transformation from the original graph to the 
induced graph (see SI). This alternating procedure of label updates followed by the 
induced graph transformation is recursively applied until we reach a hierarchical level 
where the converged value of $L$ is lower than that obtained at the previous level. The 
partition found at the penultimate level before termination is chosen as the one 
optimizing $L$. This hierarchical procedure for optimizing $L$ is similar in spirit to the 
one used in the Louvain algorithm 23 for optimizing $Q$. 
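
The label update rule of Eq. 7 can be illustrated with a short sketch. It is a minimal illustration and not the authors' implementation: the function name `best_label`, the dictionary-of-dictionaries graph format, the `z` argument (assumed to hold the $Z_{ux}$ weights from the estrangement definition, supplied as given) and the deterministic tie-breaking are all assumptions. The sketch also leaves node $x$ out of $K_l$, following the remark that diagonal terms do not affect the maximization.

```python
from collections import defaultdict

def best_label(x, adj, z, labels, lam):
    """Pick the label l maximizing N_xl - k_x * K_l / (2M) + lam * O_xl."""
    degree = {u: sum(nbrs.values()) for u, nbrs in adj.items()}
    two_m = sum(degree.values())

    n_xl = defaultdict(float)  # N_xl: edge weight between x and label l
    o_xl = defaultdict(float)  # O_xl: Z weight between x and label l
    k_l = defaultdict(float)   # K_l: total degree carrying label l

    for u, w in adj[x].items():
        n_xl[labels[u]] += w
    for u, w in z.get(x, {}).items():
        o_xl[labels[u]] += w
    for u, k in degree.items():
        if u != x:  # assumption: the diagonal (u = x) term is dropped
            k_l[labels[u]] += k

    def score(l):
        return n_xl[l] - degree[x] * k_l[l] / two_m + lam * o_xl[l]

    # Candidate labels: those of x's neighbors plus its current label.
    candidates = sorted(set(n_xl) | set(o_xl) | {labels[x]})
    return max(candidates, key=score)

# Hypothetical graph: two triangles (0,1,2) and (3,4,5) joined by edge 2-3.
adj = {0: {1: 1, 2: 1}, 1: {0: 1, 2: 1}, 2: {0: 1, 1: 1, 3: 1},
       3: {2: 1, 4: 1, 5: 1}, 4: {3: 1, 5: 1}, 5: {3: 1, 4: 1}}
labels = {0: "a", 1: "a", 2: "a", 3: "a", 4: "b", 5: "b"}
z = {2: {3: 1.0}, 3: {2: 1.0}}  # only edge 2-3 carries a Z weight

free_choice = best_label(3, adj, z, labels, lam=0.0)    # modularity term only
inertial_choice = best_label(3, adj, z, labels, lam=2.0)
```

With $\lambda = 0$ node 3 joins its denser neighborhood (label "b"), whereas a sufficiently large $\lambda$ keeps it with the neighbor it shares a $Z$ weight with (label "a"), which is the inertia that the estrangement term is meant to capture.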



Details on solving the dual problem. Having found a way to compute $g(\lambda)$ we can 
solve the dual problem and determine the value of $\lambda$ at which $g(\lambda)$ is minimized. The 
challenge here is that $g(\lambda)$ is not differentiable and, moreover, it is expensive to 
evaluate. We use Brent's method, which is often used to optimize non-differentiable 
scalar functions within a given interval. In our case, $g(\lambda)$ is the scalar function, and we 
minimize it within a suitably large range of $\lambda$. We use an implementation provided by 
Python's scientific library, SciPy, in the form of scipy.optimize.fminbound(). For all 
experiments in this work, $\lambda_{min} = 0$ and $\lambda_{max} = 10$. 

Furthermore, to mitigate issues due to the local nature of the algorithm and the 
degeneracy of the modularity landscape (and therefore the $L(\mathcal{P},\lambda)$ landscape), we 
perform several independent runs of HLPA for a given $\lambda$ and pick the run which yields 
the highest value of $g(\lambda)$. We perform at least 10 runs of HLPA as we start Brent's 
method and increase the number of runs by 10 with every iteration that narrows the 
search interval for $\lambda$. Near the optimum value of $\lambda$, we perform at least 150 runs to 
compute the Lagrange dual. For the synthetic benchmarks we increase the number of 
runs by 5 at every iteration. Once we identify the value $\lambda = \lambda^*$ for which $g(\lambda)$ is 
minimized, the partition which yielded $g(\lambda^*)$, from among the many independent 
runs for $\lambda = \lambda^*$, is chosen as the optimal partition for the given snapshot. In practice, 
due to the degeneracy of the $L(\mathcal{P},\lambda)$ landscape for any $\lambda$, we have to go slightly above 
$\lambda^*$ to ensure that the optimal partition lies within the feasible region. 
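
A toy example may help to show why $g(\lambda)$ is non-differentiable and why one steps slightly above $\lambda^*$. The sketch below is illustrative only: the three candidate partitions and their (Q, E) values are invented, $g(\lambda)$ is evaluated by brute force over these candidates instead of by HLPA runs, the Lagrangian is assumed to take the form $Q + \lambda(\delta - E)$, and a plain golden-section search stands in for Brent's method (scipy.optimize.fminbound) to keep the code dependency-free.

```python
import math

def best_candidate(lam, candidates, delta):
    """Candidate (Q, E) pair maximizing the Lagrangian Q + lam * (delta - E)."""
    return max(candidates, key=lambda qe: qe[0] + lam * (delta - qe[1]))

def dual(lam, candidates, delta):
    """g(lam): the maximum of the Lagrangian over all candidate partitions."""
    q, e = best_candidate(lam, candidates, delta)
    return q + lam * (delta - e)

def golden_section_min(f, lo, hi, tol=1e-6):
    """Derivative-free minimization of a unimodal scalar function on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) < f(d):      # minimum lies in [a, d]
            b, d = d, c
            c = b - inv_phi * (b - a)
        else:                # minimum lies in [c, b]
            a, c = c, d
            d = a + inv_phi * (b - a)
    return (a + b) / 2

# Hypothetical (modularity Q, estrangement E) values for three partitions.
candidates = [(0.50, 0.20), (0.45, 0.04), (0.30, 0.00)]
delta = 0.05

lam_star = golden_section_min(lambda lam: dual(lam, candidates, delta), 0.0, 10.0)
chosen = best_candidate(lam_star + 1e-3, candidates, delta)  # step slightly above lam*
```

Because $g(\lambda)$ here is a maximum of straight lines, it is convex with a kink at its minimum. Just below the kink the maximizing partition violates $E \le \delta$, while just above it the maximizer is the feasible partition with the highest modularity.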

Our implementation is available at https://github.com/kawadia/estrangement. 

Comparing detected temporal communities with those in ground-truth: 
Variation of Information. We utilize Variation of Information (VI) ([5]) to quantify 
how far the temporal community partitions detected by the algorithms 
(estrangement confinement, multislice modularity maximization, independent 
modularity maximization) are from those that exist in the ground-truth. Given 
partitions $P$ and $P'$ of the set of {node,time} pairs, the VI between them is defined as: 

$$VI(P, P') = -\frac{1}{n} \sum_{i \in P,\, j \in P'} n_{ij} \log\left( \frac{n_{ij}^2}{n_i n_j} \right)$$

where $n$ is the total number of {node,time} pairs, $n_i$ and $n_j$ denote the number of 
{node,time} pairs in the temporal community $i$ in $P$ and the temporal community $j$ in 
$P'$ respectively, and $n_{ij}$ is the number of nodes common to both $i$ in $P$ and $j$ in $P'$. 
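
For concreteness, VI between two labelings of the same {node,time} pairs can be computed as in the sketch below. It uses the standard form of VI, $-\frac{1}{n}\sum_{i,j} n_{ij}\log\left(n_{ij}^2/(n_i n_j)\right)$ with natural logarithms; the function name and the dictionary input format are assumptions, and the size-based filtering of community pairs applied to the synthetic benchmarks is not included.

```python
import math
from collections import Counter

def variation_of_information(part_a, part_b):
    """VI between two partitions given as {(node, time): community_label} dicts.

    Both dicts are assumed to be defined on the same set of (node, time) keys.
    """
    keys = list(part_a)
    n = len(keys)
    n_i = Counter(part_a[k] for k in keys)                 # community sizes in A
    n_j = Counter(part_b[k] for k in keys)                 # community sizes in B
    n_ij = Counter((part_a[k], part_b[k]) for k in keys)   # overlap sizes
    return -sum(c / n * math.log(c * c / (n_i[i] * n_j[j]))
                for (i, j), c in n_ij.items())

# Hypothetical example: four (node, time) pairs.
split = {("u", 0): 1, ("v", 0): 1, ("u", 1): 2, ("v", 1): 2}
merged = {("u", 0): 1, ("v", 0): 1, ("u", 1): 1, ("v", 1): 1}
vi_same = variation_of_information(split, split)
vi_diff = variation_of_information(split, merged)
```

Identical partitions give a VI of zero, while splitting one community into two equal halves gives $\log 2$.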

The ground truth partition for our synthetic networks consists of two large temporal 
communities defined by the subsets of nodes having higher edge density and 
undergoing Markovian evolution (Fig. 3). Each remaining {node,time} pair (which is 
not part of either temporal community) is assumed to be a temporal community by 
itself. The latter is perhaps an extreme assumption, but necessitated by the difficulty of 
appropriately defining ground truth communities within subsequent random 
instantiations of an Erdős-Rényi network. To alleviate the punitive nature of this 
definition, and to account for the fact that even within random graphs, communities 
consisting of more than one node may exist, we only consider those community pairs 
in the evaluation of VI for which at least one of the communities is of size greater than 
ten nodes. Thus, small communities of size greater than one but less than ten detected 
within the random background do not penalize VI despite not exactly corresponding 
to the ground truth. 

For purposes of comparison, we also run the multislice modularity maximization 
algorithm on the synthetic networks. This was done using code publicly available at: 
http://netwiki.amath.unc.edu/GenLouvain/GenLouvain. The results shown in Fig. 6 
are for the temporal community partitions with the lowest VI from the ground-truth 
obtained over values of $\omega$ = 0.05, 0.1, 0.2, 0.4, 0.6, 0.8, 1.0, with 50 independent runs 
for each value of $\omega$. 